Summary


In  this  paper  we  investigate  the  forecasting  ability  of  feedforward  and  recurrent  neural 
networks  based  on  empirical  foreign  exchange  rate  data.  A  two-step  procedure  is  proposed 
to  construct  suitable  networks,  in  which  networks  are  selected  based  on  the  predictive 
stochastic  complexity  (PSC)  criterion,  and  the  selected  networks  are  estimated  using  both 
recursive  Newton  algorithms  and  the  method  of  nonlinear  least  squares.  We  find  that 
PSC  is  a  sensible  criterion  for  selecting  networks  and  that  the  out-of-sample  performance 
of  neural  networks  is  reasonably  good.  In  particular,  the  networks  selected  based  on 
PSC  have  rather  satisfactory  out-of-sample  sign  prediction  results,  in  contrast  with  some 
commonly  used  ARMA  models. 


1      Introduction 

Neural  networks  provide  a  general  class  of  nonlinear  models  which  has  been  successfully 
applied  in  many  different  fields.  Numerous  empirical  and  computational  applications  can 
be  found  in  the  Proceedings  of  the  International  Joint  Conference  on  Neural  Networks  and 
Conference of Neural Information Processing Systems. In spite of their success in various
fields, there are only a few applications of neural networks in economics. Neural networks
are novel in econometric applications in the following two respects. First, the class of multi-layer
neural networks can well approximate a large class of functions (Hornik, Stinchcombe,
and White (1989) and Cybenko (1989)), whereas most of the commonly used nonlinear time-series
models do not have this property. Second, as shown in Barron (1991), neural
networks  are  more  parsimonious  models  than  linear  subspace  methods  such  as  polynomial, 
spline,  and  trigonometric  series  expansions  in  approximating  unknown  functions.  Thus, 
if  the  behavior  of  economic  variables  exhibits  nonlinearity,  a  suitably  constructed  neural 
network  can  serve  as  a  useful  tool  to  capture  such  regularity. 

In  this  paper  we  investigate  possible  nonlinear  patterns  in  foreign  exchange  data  using 
feedforward  and  recurrent  networks.  It  has  been  widely  accepted  that  foreign  exchange 
rates are I(1) (integrated of order one) processes and that changes of exchange rates are
uncorrelated  over  time.  Hence,  changes  in  exchange  rates  are  not  linearly  predictable  in 
general.  For  a  comprehensive  review  of  these  issues,  see  Baillie  and  McMahon  (1989). 
Since  the  empirical  studies  supporting  these  conclusions  rely  mainly  on  linear  time  series 
techniques, it is not unreasonable to conjecture that the linear unpredictability of exchange
rates may be due to limitations of linear models. Hsieh (1989) finds that changes
of exchange rates may be nonlinearly dependent, even though they are linearly uncorrelated.
Some researchers also provide evidence in favor of nonlinear forecasts, e.g., Taylor
(1980, 1982), Engel and Hamilton (1990), Engel (1991), and Chinn (1991). On the
other hand, Diebold and Nason (1990) find that nonlinearities of exchange rates, if any,
cannot be exploited to improve forecasting. Therefore, we treat neural networks as alternative
nonlinear models and focus on whether neural networks can provide superior
out-of-sample forecasts.


This paper has two objectives. First, we introduce different neural network modeling
techniques and propose a two-step procedure to construct suitable neural networks. Second,
we evaluate the performance of networks obtained from the proposed procedure in
terms of out-of-sample MSE (mean squared error) and sign predictions (i.e., forecasts of
the direction of future changes). In the first step of the proposed procedure, we apply
recursive Newton algorithms to estimate networks and compute the so-called "predictive
stochastic complexity" (Rissanen (1987)), from which we can easily select suitable
networks. In the second step, statistically more efficient estimates are obtained by the
method of nonlinear least squares using recursive estimates from the first step as initial
values. Our procedure differs from previous applications of feedforward networks in economics,
e.g., White (1988) and Kuan and White (1990), in that networks are selected
objectively. Also, the application of recurrent networks is new in applied econometrics;
hence their performance should also be of interest to researchers. Our results show that
predictive stochastic complexity is a sensible criterion for selecting networks and that the
resulting networks perform reasonably well in different out-of-sample periods. In particular,
the selected networks yield quite satisfactory out-of-sample sign predictions which
are significantly better than predictions based on tossing a coin. This result is in contrast
with that of some commonly used ARMA models.

This paper proceeds as follows. We review various network architectures and estimation
methods in section 2. The network construction procedures are described in section
3. Empirical results are analyzed in section 4. Section 5 concludes the paper.

2      Feedforward  and  Recurrent  Networks 

In  this  section  we  briefly  describe  feedforward  and  recurrent  networks  and  associated 
estimation  methods.  For  more  details  see  Kuan  and  White  (1993a). 

2.1      Network  Functional  Forms 

A neural network may be interpreted as a nonlinear regression function characterizing the
relationship between the dependent variable (target) $y$ and an $n$-vector of explanatory
variables (inputs) $x$. Instead of postulating a specific nonlinear function, a neural network
model is constructed by combining many "basic" nonlinear functions via a multi-layer
structure. In a feedforward network, the explanatory variables first simultaneously activate
$q$ hidden units in an intermediate layer through some function $\Psi$, and the resulting hidden-unit
activations $h_i$, $i = 1, \ldots, q$, then activate output units through some function $\Phi$ to
produce the network output $o$ (see Figure 1). Symbolically, we have

$$
\begin{aligned}
h_{i,t} &= \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_{j,t}\Big), \qquad i = 1, \ldots, q, \\
o_t &= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i h_{i,t}\Big),
\end{aligned}
\tag{1}
$$

or more compactly,

$$
o_t = \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_{j,t}\Big)\Big) =: f(x_t, \theta),
\tag{2}
$$

where $\theta$ is the vector of parameters containing all $\beta$'s and $\gamma$'s. This is a flexible nonlinear
functional form in that the activation functions $\Psi$ and $\Phi$ can be chosen quite arbitrarily,
except that $\Psi$ is generally required to be a bounded function. Hornik, Stinchcombe,
and White (1989) and Cybenko (1989) show that the function $f$ constructed in (2) can
approximate a large class of functions arbitrarily well (in a suitable metric), provided that
the number of hidden units, $q$, is sufficiently large. This property is very similar to that
of nonparametric methods. Barron (1991) also shows that a feedforward network can
achieve an approximation rate $O(1/q)$ by using a number of parameters $O(qn)$ that grows
linearly in $q$, whereas traditional polynomial, spline, and trigonometric expansions require
exponentially many, $O(q^n)$, terms to achieve the same approximation rate. Thus, neural networks
are relatively more parsimonious than these series expansions in approximating unknown
functions. These two properties make feedforward networks an attractive econometric tool
in (nonparametric) applications.
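As a concrete reading of the functional form (2), the following minimal Python sketch evaluates a single-output feedforward network. The function names, the argument layout, and the numerical values are illustrative assumptions and are not taken from the paper; the logistic hidden activation and identity output activation are only one admissible choice of $\Psi$ and $\Phi$.

```python
import math

def logistic(z):
    """Logistic squashing function, one bounded choice for the hidden activation Psi."""
    return 1.0 / (1.0 + math.exp(-z))

def feedforward_output(x, gamma, beta, hidden_act=logistic, output_act=lambda z: z):
    """Evaluate o = Phi(beta_0 + sum_i beta_i * Psi(gamma_i0 + sum_j gamma_ij * x_j)).

    x     : list of n inputs
    gamma : list of q rows [gamma_i0, gamma_i1, ..., gamma_in]
    beta  : list [beta_0, beta_1, ..., beta_q]
    """
    # Hidden-unit activations h_i, one per row of gamma.
    hidden = [hidden_act(row[0] + sum(g * xj for g, xj in zip(row[1:], x)))
              for row in gamma]
    # Network output: output activation applied to an affine function of the h_i.
    return output_act(beta[0] + sum(b * h for b, h in zip(beta[1:], hidden)))

# Hypothetical parameter values: n = 2 inputs, q = 2 hidden units.
x = [0.5, -1.0]
gamma = [[0.0, 1.0, 1.0],
         [0.0, 0.0, 0.0]]
beta = [0.1, 2.0, -1.0]
o = feedforward_output(x, gamma, beta)
```

The parameter count is $q(n+1) + (q+1)$, which grows linearly in $q$ for fixed $n$, in line with the $O(qn)$ order quoted from Barron (1991).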

[  Figure  1  About  Here  ] 

In a dynamic context, it is natural to include lagged dependent variables as explanatory
variables in a feedforward network to capture dynamics. This approach suffers the
drawback that the correct number of lags needed is typically unknown (this is analogous
to the problem of determining the order of an autoregression). Hence, the lagged dependent
variables in a network may not be enough to characterize the behavior of $y$ in some
applications. To overcome this deficiency, various recurrent networks, i.e., networks with
feedback, have been proposed. A recurrent network has a richer dynamic structure and is
similar to a linear time-series model with moving average terms. In particular, we consider
the following network due to Elman (1990) (see Figure 2):

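$$
\begin{aligned}
h_{i,t} &= \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_{j,t} + \sum_{\ell=1}^{q} \delta_{i\ell} h_{\ell,t-1}\Big)
 =: \psi_i(x_t, h_{t-1}, \theta), \qquad i = 1, \ldots, q, \\
o_t &= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i h_{i,t}\Big)
 =: \phi(x_t, h_{t-1}, \theta),
\end{aligned}
\tag{3}
$$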

where $\theta$ denotes the vector of parameters containing all $\beta$'s, $\gamma$'s, and $\delta$'s. Here, the hidden-unit
activations $h_i$ feed back to the input layer with delay and serve to "memorize" the
past information, cf. (1). From (3) we can write, by recursive substitution,

$$
h_{i,t} = \psi_i(x_t, \psi(x_{t-1}, h_{t-2}, \theta), \theta) = \cdots =: r_i(x^t, \theta), \qquad i = 1, \ldots, q,
\tag{4}
$$

where $x^t = (x_t, x_{t-1}, \ldots, x_1)$. Hence, $h_{i,t}$ depends on $x_t$ and its entire history. It follows
that

$$
o_t = \phi(x_t, h_{t-1}, \theta) =: g(x^t, \theta)
\tag{5}
$$

is also a function of $x_t$ and its entire history, cf. (2). In view of (5), we expect that a
recurrent network may capture more dynamic characteristics of $y_t$ than does a feedforward
network.
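The dependence of the output on the entire input history can be seen in a small Python sketch of the Elman-type recursion (3). The function names, the zero initial state, and the parameter values used in the check are assumptions made for illustration; the logistic hidden activation and identity output are again just one possible choice.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def elman_step(x, h_prev, gamma, delta, beta):
    """One time step of the recurrent network; returns (o_t, h_t).

    gamma : q rows [gamma_i0, gamma_i1, ..., gamma_in]   (input weights)
    delta : q rows [delta_i1, ..., delta_iq]             (feedback weights)
    beta  : [beta_0, beta_1, ..., beta_q]                (output weights)
    """
    h = [logistic(g[0]
                  + sum(gj * xj for gj, xj in zip(g[1:], x))
                  + sum(d * hl for d, hl in zip(d_row, h_prev)))
         for g, d_row in zip(gamma, delta)]
    o = beta[0] + sum(b * hi for b, hi in zip(beta[1:], h))
    return o, h

def elman_run(xs, gamma, delta, beta, h0=None):
    """Run the recursion over a sequence of input vectors, starting from h0 (assumed zero)."""
    h = list(h0) if h0 is not None else [0.0] * len(gamma)
    outputs = []
    for x in xs:
        o, h = elman_step(x, h, gamma, delta, beta)
        outputs.append(o)
    return outputs
```

With all feedback weights set to zero the recursion collapses to the feedforward form, so two input sequences that agree at time $t$ give the same output at $t$; with nonzero feedback weights the outputs generally differ because earlier inputs are carried forward through $h_{t-1}$.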


[  Figure  2  About  Here  ] 


2.2     Estimation  Methods 


Given a dependent variable $y$ and a feedforward network (2) with explanatory variables
$x$, we want to find suitable parameters $\theta^*$ minimizing

$$
E|y - f(x,\theta)|^2 = E|y - E(y \mid x)|^2 + E|E(y \mid x) - f(x,\theta)|^2.
\tag{6}
$$

This is equivalent to minimizing $E|E(y \mid x) - f(x,\theta)|^2$. That is, we want to use the feedforward
network to approximate the unknown conditional mean function and minimize the
resulting squared approximation errors. Since $E(y \mid x)$ is the best $L_2$-predictor of $y$ given $x$,
the network output $o_t = f(x_t, \theta^*)$ should match $y_t$ fairly closely, at least in the $L_2$ sense.
In view of (6), the unknown parameters can be estimated using the method of Nonlinear
Least Squares (NLS). Alternatively, recursive estimation methods may be used. Although
recursive estimation is important for adaptive learning and on-line signal processing, it is
well known that recursive algorithms do not utilize the data efficiently in finite samples.
However, recursive estimation can provide useful starting values for the NLS estimator
and facilitate network selection (see discussions in Section 3). Specifically, we consider the
following stochastic Newton algorithm:

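$$
\begin{aligned}
\hat\theta_{t+1} &= \hat\theta_t + \eta_t \hat G_t^{-1} \nabla f(x_t, \hat\theta_t)\,[y_t - f(x_t, \hat\theta_t)], \\
\hat G_{t+1} &= \hat G_t + \eta_t\,[\nabla f(x_t, \hat\theta_t) \nabla f(x_t, \hat\theta_t)' - \hat G_t],
\end{aligned}
\tag{7}
$$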

where $\nabla f(x, \theta)$ is the (column) gradient vector of $f$ with respect to $\theta$ and $\{\eta_t\}$ is a sequence
of learning rates of order $1/t$. Note that $\nabla f(x,\theta)[y - f(x,\theta)]$ is proportional to the vector of the first-order
derivatives of the squared-error loss $[y - f(x,\theta)]^2$ and that the second updating equation
recursively estimates an approximate Newton direction. Thus, the algorithm (7) performs
a recursive Newton search in the parameter space. Kuan and White (1993a) show that the
estimates of (7) are root-$T$ consistent and asymptotically equivalent to the NLS estimator
under very general conditions. In practice, an algebraically equivalent form of (7) which
does not involve matrix inversion can be used to simplify computation; see Kuan and
White (1993a). We also note that if $f$ is a linear function, the algorithm (7) reduces to
the well-known recursive least squares algorithm; see, e.g., Ljung and Söderström (1983).
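The two updating equations in (7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it solves the linear system directly rather than using the inversion-free form mentioned above, the learning rate $\eta_t = 1/(t+1)$ and identity starting matrix are arbitrary choices, and the demonstration uses a linear $f$ (the recursive-least-squares special case) on simulated data so that the target parameters are known.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def newton_step(theta, G, x, y, f, grad_f, eta):
    """One application of the two updating equations; both use the old theta and G."""
    g = grad_f(x, theta)                      # gradient of f with respect to theta
    err = y - f(x, theta)                     # current prediction error
    direction = solve(G, [gi * err for gi in g])
    new_theta = [th + eta * d for th, d in zip(theta, direction)]
    new_G = [[G[i][j] + eta * (g[i] * g[j] - G[i][j]) for j in range(len(g))]
             for i in range(len(g))]
    return new_theta, new_G

def f_lin(x, theta):
    return sum(t * xi for t, xi in zip(theta, x))

def grad_lin(x, theta):
    return list(x)

def run_linear_demo(T=5000, seed=0):
    """Simulated data with hypothetical true parameters; returns (estimate, truth)."""
    rng = random.Random(seed)
    true_theta = [0.5, -0.3]
    theta, G = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for t in range(1, T + 1):
        x = [1.0, rng.gauss(0.0, 1.0)]
        y = f_lin(x, true_theta) + 0.01 * rng.gauss(0.0, 1.0)
        theta, G = newton_step(theta, G, x, y, f_lin, grad_lin, eta=1.0 / (t + 1))
    return theta, true_theta
```

For a network, `f` and `grad_f` would be replaced by the network function (2) and its gradient with respect to all $\beta$'s and $\gamma$'s; the update itself is unchanged.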

Similarly, the parameters of interest of a recurrent network are $\theta^*$ that minimize

$$
E|y_t - g(x^t, \theta)|^2.
$$

Here, $g(x^t, \theta^*)$ can be viewed as an approximation of $E(y_t \mid x^t)$. In view of (4) and (5),
$h_t$ and $o_t$ depend on $\theta$ directly and indirectly through the presence of lagged hidden-unit
activations $h_{t-1}$; hence both $r$ and $g$ are complex functions of $\theta$. In particular, in
calculating the derivatives of $g$ with respect to $\theta$, parameter dependence of $h_{t-1}$ must be
taken into account. Owing to this "state dependent" structure, it is difficult to implement
the method of NLS, and the algorithm (7) is invalid.

A recurrent Newton algorithm analogous to (7) is

$$
\begin{aligned}
e_t &= y_t - \phi(x_t, \hat h_{t-1}, \hat\theta_t), \\
\nabla e_t &= -\phi_\theta(x_t, \hat h_{t-1}, \hat\theta_t) - \hat\Delta_t\, \phi_h(x_t, \hat h_{t-1}, \hat\theta_t), \\
\hat\theta_{t+1} &= \hat\theta_t - \eta_t \hat G_t^{-1} \nabla e_t\, e_t, \\
\hat G_{t+1} &= \hat G_t + \eta_t (\nabla e_t \nabla e_t' - \hat G_t),
\end{aligned}
\tag{8}
$$

where the $i$-th ($i = 1, \ldots, q$) hidden-unit activation is updated according to

$$
\hat h_{i,t} = \Psi\Big(\hat\gamma_{i0,t} + \sum_{j=1}^{n} \hat\gamma_{ij,t} x_{j,t} + \sum_{\ell=1}^{q} \hat\delta_{i\ell,t} \hat h_{\ell,t-1}\Big) = \psi_i(x_t, \hat h_{t-1}, \hat\theta_t),
\tag{9}
$$

the $j$-th ($j = 1, \ldots, q$) column of $\hat\Delta_{t+1}$ is updated according to

$$
\hat\Delta_{j,t+1} = \psi_{j,\theta}(x_t, \hat h_{t-1}, \hat\theta_t) + \hat\Delta_t\, \psi_{j,h}(x_t, \hat h_{t-1}, \hat\theta_t),
\tag{10}
$$

and the initial values $\hat\theta_0$, $\hat h_0$, and $\hat\Delta_0$ are chosen arbitrarily. Here, $\phi_\theta$ and $\phi_h$ are column
vectors of the first-order derivatives of $\phi$ with respect to $\theta$ and $h$, respectively, and $\psi_{j,\theta}$
and $\psi_{j,h}$ are column vectors of the first-order derivatives of the $j$-th hidden unit $\psi_j$ with
respect to $\theta$ and $h$, respectively. The recurrent Newton algorithm differs from (7) in that
updating equations (9) and (10) allow us to update the $\partial h_{t-1}/\partial\theta$ term recursively. Clearly,
a recurrent network not depending on $h_{t-1}$ is a feedforward network. In this case, the $\phi_h$
term is zero so that the updating equations of $\hat\Delta_t$ are not needed, and (8) simply reduces
to the standard Newton algorithm (7). In view of the first equation in (3), we can see that
certain constraints must be imposed to prevent $h$ from being "explosive". Kuan (1993)
shows that the recurrent Newton algorithm is strongly consistent, provided that $|\delta_{i\ell}| < 4/q$
for all $i$ and $\ell$, where $q$ is the number of hidden units, and is computationally more efficient
than the "recurrent back-propagation" algorithm of Kuan, Hornik, and White (1993); see
also Kuan and White (1993b).


3      Network  Construction 

In this paper, we choose the activation function $\Psi$ to be the logistic function and $\Phi$ to be the
identity function in the networks (1) and (3). These choices are quite standard in the
neural  network  literature.  Our  dependent  variables  are  changes  of  log  exchange  rates, 
and  for  each  exchange  rate,  networks  are  constructed  using  lagged  dependent  variables  as 
explanatory  variables.  The  resulting  networks  are  therefore  nonlinear  AR  models. 
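To make the nonlinear AR setup concrete, the following Python sketch builds the input vectors of lagged dependent variables that such a network would receive. The helper name `make_lagged` and the ordering of lags within each input vector are assumptions for illustration.

```python
def make_lagged(y, n_lags):
    """Pair each target y[t] with the input vector [y[t-1], ..., y[t-n_lags]].

    The first n_lags observations are used only as inputs, so the returned
    lists have len(y) - n_lags entries.
    """
    inputs, targets = [], []
    for t in range(n_lags, len(y)):
        inputs.append([y[t - j] for j in range(1, n_lags + 1)])
        targets.append(y[t])
    return inputs, targets

# Toy series, for illustration only.
inputs, targets = make_lagged([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
```

Each `(inputs[t], targets[t])` pair then plays the role of $(x_t, y_t)$ in the network equations.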

A difficult problem in network construction is to determine network complexity. This
involves the determination of the number of lagged dependent variables and the number
of hidden units. A very simple network may not be able to approximate the unknown conditional
mean function well; an excessively complex network may overfit the data. There
is, however, no definite conclusion regarding the determination of network complexity.
As neural network models are, by construction, approximating functions, it is our
opinion that the determination of network complexity is a model selection problem. One
possible criterion is the Schwarz (1978) Information Criterion (SIC). Rissanen (1983, 1984)
shows that this criterion can be applied to a more general setting than linear models; in
particular, the SIC is asymptotically equivalent to the stochastic complexity of a model
(Rissanen (1987)). Note, however, that selecting networks based on SIC is computationally
demanding because NLS is required for estimating every possible network.

An alternative criterion to regularize network complexity is the "Predictive Stochastic
Complexity" (PSC) criterion due to Rissanen (1986a,b); see also Rissanen (1987). Given
a function $m(x, \theta)$, where $\theta$ is a $k$-dimensional parameter vector, and a sample of $T$
observations, PSC is computed as the average of squared, "honest", prediction errors:

$$
\frac{1}{T-k} \sum_{t=k+1}^{T} \big(y_t - m(x_t, \hat\theta_t)\big)^2,
\tag{11}
$$

where $\hat\theta_t$ is the predicted parameter obtained from the data up to time $t-1$. The prediction
error $y_t - m(x_t, \hat\theta_t)$ is "honest" in the sense that no information at time $t$ or beyond is used
to calculate $\hat\theta_t$. A particular model is selected if it has the smallest PSC within a class of
models. If two models have the same PSC, the simpler one is selected. Clearly, the PSC
criterion is based on forward validation, which is particularly important in forecasting.
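The PSC formula (11) and the selection rule can be written down directly. In the Python sketch below, the function names and the tuple layout of `candidates` are assumptions; the honest predictions $m(x_t, \hat\theta_t)$ are taken as given, for instance from a recursive estimator that only uses data up to $t-1$.

```python
def psc(y, honest_predictions, k):
    """Average squared honest prediction error over t = k+1, ..., T (1-based indexing).

    y                  : list of T observed targets
    honest_predictions : list of T predictions, entry t computed without using y[t] or later data
    k                  : number of parameters of the model
    """
    T = len(y)
    if len(honest_predictions) != T or not 0 <= k < T:
        raise ValueError("need len(honest_predictions) == len(y) and 0 <= k < len(y)")
    squared_errors = [(y[t] - honest_predictions[t]) ** 2 for t in range(k, T)]
    return sum(squared_errors) / (T - k)

def select_model(candidates):
    """candidates: list of (name, n_params, psc_value).

    The smallest PSC wins; ties are broken in favor of fewer parameters.
    """
    return min(candidates, key=lambda c: (c[2], c[1]))[0]
```

Because each prediction is made before the corresponding observation is seen, a model with many parameters is penalized automatically through its poorer early predictions, without an explicit penalty term.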


Rissanen also shows that for encoding a sequence of numbers, the PSC criterion can
determine the code with the shortest code length asymptotically. For a thorough discussion
of the notion of stochastic complexity we refer to Rissanen (1989). Obviously, calculation
of PSC is also computationally demanding if NLS is required to estimate $\hat\theta_t$ at each $t$.
Following the idea of Gerencsér and Rissanen (1992), we can compute $\hat\theta_t$ using recursive
estimation methods, which are more tractable computationally. Clearly, both (7) and (8)
can be easily applied to compute PSC.

We  therefore  adopt  the  following  two-step  procedure  to  construct  suitable  networks. 

1.  Recursive  estimation.  A  family  of  networks  with  different  numbers  of  explanatory 
variables  and  hidden  units  is  estimated  using  the  stochastic  Newton  algorithm  (7) 
or  the  recurrent  Newton  algorithm  (8). 

(a) Ten sets of initial parameters are generated randomly from $N(0,1)$, and the
one that results in the lowest MSE is used as the initial values for recursive
algorithms.

(b) We let the algorithm run through the data set once and compute the PSC
values. The three best networks according to the PSC values are selected.

2. NLS estimation. The FORTRAN subroutine LMDER in MINPACK¹ is used.

(a) For selected feedforward networks, the final recursive estimates are used as
initial values of the NLS estimator for $\theta$.

(b) For selected recurrent networks, we fix the recurrent parameters, $\delta$'s, at the
final recursive estimates and use the rest of the recursive estimates as initial
values of the NLS estimator for the parameters $\beta$'s and $\gamma$'s.
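The bookkeeping in step 1 of the procedure above can be sketched as follows. This Python fragment only illustrates the random initialization and the ranking by PSC; the callable `mse_of` (which would score a candidate parameter vector on the data) and the dictionary layout are hypothetical, and the recursive estimation itself is not shown.

```python
import random

def best_initial_values(n_params, mse_of, n_draws=10, seed=0):
    """Draw n_draws parameter vectors with N(0, 1) entries and keep the lowest-MSE one.

    mse_of : hypothetical callable mapping a parameter vector to its MSE on the data.
    """
    rng = random.Random(seed)
    draws = [[rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(n_draws)]
    return min(draws, key=mse_of)

def top_three_by_psc(psc_by_network):
    """psc_by_network: dict mapping (lags, hidden_units) to the PSC value of that network."""
    ranked = sorted(psc_by_network.items(), key=lambda item: item[1])
    return [network for network, _ in ranked[:3]]
```

The three networks returned by `top_three_by_psc` would then be passed to the NLS step, started from their final recursive estimates.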

In the proposed procedure, both recursive and NLS estimations are used. Recursive estimation
facilitates network selection because PSC can be easily computed using the Newton
algorithms and is particularly important for recurrent networks. Moreover, the recursive
estimates may provide useful starting values of the NLS estimator in the next step. NLS
estimation in the second step is used to improve efficiency of parameter estimates. This
two-step estimation is analogous to that of White (1989). Note that the parameters $\delta$'s in
recurrent networks are fixed in the second step to avoid constrained minimization. (Recall
that $\delta$'s must be constrained suitably to ensure proper convergence behavior.) Hence,
the second step for recurrent network construction is analogous to building a partially
hard-wired recurrent network (Kuan and Hornik (1991)).

¹MINPACK is a collection of FORTRAN subroutines from Argonne National Laboratory, and LMDER
is one of its NLS subroutines. LMDER is based on a modification of the Levenberg-Marquardt algorithm;
details of this algorithm can be found in Moré (1977).

4     Empirical  Results 

In this paper five exchange rates against the U.S. dollar, including the British Pound (BP),
Canadian Dollar (CD), Deutsche Mark (DM), Japanese Yen (JY), and Swiss Franc (SF),
are investigated. The data are daily opening bid prices of the NY Foreign Exchange
Market from March 1, 1980 to January 28, 1985, consisting of 1245 observations. All
series except BP are US dollars per unit of foreign currency. This data set has also been
used in Baillie and Bollerslev (1989). Let $S_{i,t}$ denote the $i$-th exchange rate at time $t$,
and $y_{i,t} = \log S_{i,t} - \log S_{i,t-1}$, $i$ = BP, CD, DM, JY, SF. By applying various unit-root
tests, Baillie and Bollerslev (1989) find that $\log S_{i,t}$ are unit root processes without drift
and that $y_{i,t}$ behave like a martingale difference sequence. We also estimated thirty-six
ARMA models for $y_{i,t}$ from ARMA(0,0) to ARMA(5,5) and found that ARMA(0,0) is the
best model for all five series in terms of the SIC values. This is consistent with the results
of Baillie and Bollerslev (1989). In what follows, we will abuse terminology and refer to
ARMA(0,0) as the random walk model.
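The transformation from rates to the dependent variable is a one-line computation. The sketch below, with a hypothetical function name, shows it in Python; note that differencing a series of $T$ rates yields $T - 1$ changes.

```python
import math

def log_changes(rates):
    """Return y_t = log S_t - log S_{t-1} for a list of positive exchange rates."""
    return [math.log(current) - math.log(previous)
            for previous, current in zip(rates, rates[1:])]
```

For small movements, each $y_t$ is approximately the proportional change in the rate from one day to the next.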

To evaluate the forecasting performance of different models of $y_{i,t}$, we reserve the last
50, 100, and 150 observations as out-of-sample periods and estimate models using 1194,
1144, and 1094 observations, respectively. These choices are arbitrary. Of particular
interest to us is whether a model can outperform the random walk model in terms of
out-of-sample MSE. We apply the Mizrach (1992) test² to check whether the out-of-sample
MSE of network models are significantly different from those of the random walk model. In
addition, we evaluate out-of-sample sign predictions of $y_{i,t}$. Sign prediction gives forecasts
of the direction of future changes and yields important information in financial forecasting.
In an extreme case, a model could have small out-of-sample MSE but predict all the signs
incorrectly, and hence be virtually useless. We expect a good model to have the proportion
of correct sign predictions (in out-of-sample period) significantly better than 0.5, i.e., better
than predictions based on tossing a coin. Thus, taking the null hypothesis as $p = 0.5$, the
following test statistic is used:

$$
\sqrt{n}\,(z - p)/\sqrt{p(1-p)} = \sqrt{n}\,(z - 0.5)/0.5 \;\stackrel{d}{\longrightarrow}\; N(0,1),
$$

where $z$ is the proportion of correct sign predictions and $n$ is the number of observations
in an out-of-sample period. (We thank a referee for this suggestion.) For a one-sided test
and $n$ = 50, 100, and 150, it is easily verified that at the 5% level, $z$ > .616, .583, and .567
is significant, and that at the 10% level, $z$ > .591, .564, and .552 is significant.

Neural network models are constructed according to the two-step procedure described
in Section 3. Note that for each series, the network explanatory variables are lagged
dependent variables³. In the first step, thirty feedforward and recurrent networks (with 1-6
lagged $y_{i,t}$ and 2-6 hidden units) are estimated using the recursive Newton algorithms,
and the three networks with best PSC values are selected. In the second step, the selected
networks are further "smoothed" using the method of NLS. (We omit networks with
one hidden unit because they are not practically interesting.) Out-of-sample forecasting
results from recursive and NLS estimation are summarized in Tables 1-5, where we write
the network with $L$ lags and $H$ hidden units as the network $(L,H)$. Ideally, we can
construct a multiple-output network for all 5 series, analogous to a multivariate nonlinear
regression model. A program implementing multiple-output networks is currently under
development.

²These tests are computed based on the program provided by Prof. Mizrach. In our computation, models
with MSE smaller than the random walk model have positive statistics. As the limiting distribution of
this statistic is $N(0,1)$, the critical values of a one-sided test at 5% and 10% levels are 1.645 and 1.282,
respectively.

³We have also constructed networks for each $y_{i,t}$ using lagged $y_{j,t}$, $j \neq i$, as additional explanatory
variables. The results are not particularly exciting. We therefore confine ourselves to networks of the
present form which, as we have mentioned, are simply nonlinear AR models.
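As a small numerical companion to the sign test described above, the following Python sketch computes the proportion of correct sign predictions, the test statistic under $p = 0.5$, and the smallest proportion that is significant at a given one-sided level. The function names are hypothetical, and treating a zero change as "not positive" is an assumption, since the text does not say how zeros are handled.

```python
import math
from statistics import NormalDist

def proportion_correct(actual, predicted):
    """Share of periods in which the predicted change has the same sign as the actual change.

    Assumption: a value of exactly zero is grouped with the negative changes.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if (a > 0) == (p > 0))
    return hits / len(actual)

def sign_test_statistic(z, n):
    """sqrt(n) * (z - 0.5) / 0.5, approximately N(0, 1) under the null p = 0.5."""
    return math.sqrt(n) * (z - 0.5) / 0.5

def critical_proportion(n, level):
    """Proportion of correct signs above which a one-sided test rejects at the given level."""
    return 0.5 + NormalDist().inv_cdf(1.0 - level) * 0.5 / math.sqrt(n)
```

For example, `sign_test_statistic(0.72, 50)` exceeds the 5% critical value 1.645, so a hit rate of 72% over 50 test observations would be judged significantly better than tossing a coin.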

[  Tables  1-5  About  Here  ] 

We  first  observe  that  a  wide  variety  of  networks  have  been  selected  and  that  for  each 
series,  selected  networks  have  quite  similar  structures.  In  particular,  there  is  at  least  one 
common  (feedforward  or  recurrent)  network  selected  in  three  (or  two)  in-sample  periods. 
These common networks are⁴:

1.  BP:  feedforward  (5,3);  recurrent  (1,2)  and  (6,2). 

2.  CD:  feedforward  (1,4);  recurrent  (1,2)  and  (2,2). 

3.  DM:  feedforward  (2,2);  recurrent  (1,2)  and  (4,2). 

4.  JY:  feedforward  (1,3);  recurrent  (1,3). 

5.  SF:  feedforward  (2,2);  recurrent  (1,2)  and  (1,4). 

Note  that  the  structures  of  these  common  networks  are  not  very  complex.  These  results 
seem  to  suggest  that  there  exists  only  mild  nonlinearity  in  these  series. 

We  also  observe  the  following. 

1.  In  terms  of  out-of-sample  MSE: 

(a) None of the selected networks is significantly better (or worse) than the random
walk model.

(b)  Selected  feedforward  and  recurrent  networks  do  not  dominate  each  other;  NLS 
forecasting  results  need  not  be  better  than  corresponding  recursive  results. 

(c)  Except  for  the  CD,  all  the  common  networks  listed  above  have  out-of-sample 
MSE  (from  recursive  results)  smaller  than  those  of  the  random  walk  model. 

2. In terms of out-of-sample sign predictions:

(a) Except for the CD, many selected networks have correct sign predictions of about
60% and are significantly better than tossing a coin.

(b) Selected feedforward and recurrent networks do not dominate each other; NLS
sign predictions need not be better than corresponding recursive results.

(c) Except for the CD, all the common networks have out-of-sample sign predictions
(from recursive results) better than tossing a coin. Most of these recursive
prediction results are also better than the corresponding NLS results.

⁴For the JY and feedforward networks of the BP, the common networks listed are taken from the periods
with 100 and 150 test observations.

These results show that the PSC criterion is a quite sensible criterion to determine
network structure. The results for out-of-sample MSE suggest that the "captured" nonlinearity
cannot be exploited to improve forecasting MSE. The results for sign predictions
seem to indicate that there may be some hope of predicting the directions of future changes
based on recursive estimation results. In fact, obtaining correct sign predictions consistently
about 60% of the time (or more) in four out of five series is quite encouraging.
We note that our estimation methods are based on MSE minimization, which is not a
loss function specific for sign predictions. It would be interesting to construct estimation
methods based on a suitable loss function; this is beyond the scope of this paper, however.
Although the performance of recurrent networks is different from that of feedforward
networks, it is somewhat surprising to us that recurrent networks do not outperform feedforward
networks. One possible interpretation is that the feedback structure in recurrent
networks cannot be very effective when there is very little correlation across the dependent
variables.

For the sake of comparison, we also evaluate the out-of-sample performance of four
commonly used ARMA models⁵, including ARMA(1,0), (0,1), (1,1), and (2,2). The results
are summarized in Table 6. For the JY, ARMA models have out-of-sample MSE
significantly better than those of the random walk model in two forecasting periods with
100 and 150 test observations, but this dominance disappears in the period with 50 observations.
For the CD, ARMA models are significantly worse than the random walk model
in the forecasting period with 150 test observations but become significantly better in the
forecasting period with 50 observations. For other series, ARMA model forecasts are not
significantly different from those of the random walk model. We also observe that most
of the correct sign predictions of ARMA models fluctuate around 50%, except those of
ARMA(1,0) and (0,1) for the BP.

⁵A referee points out that a more comparable way is to select and estimate ARMA models also based
on the proposed two-step procedure. Selecting ARMA models based on PSC has been discussed by, e.g.,
Gerencsér (1990). However, implementing the two-step procedure is more involved; see, e.g., Ljung and
Söderström (1983). As our emphasis is on neural network models, we do not pursue this possibility.

[  Table  6  About  Here  ] 

5      Conclusions 

In this paper we propose a two-step procedure to select and estimate feedforward and
recurrent networks, and we evaluate the forecasting performance of the selected networks
in different out-of-sample periods. We find that PSC is a sensible criterion for selecting
networks. Based on this criterion, it is possible to find a network with better out-of-sample
MSE and/or sign predictions than the random walk model. Hence, the proposed two-step
procedure may be used as a standard network construction procedure in other applications.
Our results show, however, that these networks are not significantly better than the random
walk model in terms of out-of-sample MSE. We therefore confirm the conclusion of Diebold
and Nason (1990) that nonlinearity of exchange rates may not be exploitable for improving
point forecasts. If we are less ambitious about point forecasts and confine ourselves to
sign predictions, our results suggest that network models perform quite well in four of the
five series we investigated. In particular, the selected networks have sign predictions
systematically better than predictions based on coin tossing. This is an interesting
direction for further research. On the other hand, ARMA models may have out-of-sample MSE
significantly better than that of the random walk model, but their proportions of correct
sign predictions typically fluctuate around 50%. Finally, our results show no significant
difference between feedforward and recurrent networks in this application.
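
The comparison with coin tossing can be made concrete by testing the null hypothesis that a correct sign prediction has probability 0.5. The sketch below uses the normal approximation to the binomial distribution. This particular test, the function names, and the example counts are assumptions chosen for illustration; they are not taken from the paper and need not coincide with the test behind the significance marks in the tables.

```python
import math


def sign_test_z(n_correct: int, n_total: int) -> float:
    """z-statistic for H0: P(correct sign) = 0.5 (coin tossing).

    Assumption: the variance is evaluated under the null, 0.25 / n_total.
    """
    p_hat = n_correct / n_total
    return (p_hat - 0.5) / math.sqrt(0.25 / n_total)


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical example: 36 correct signs out of 50 forecasts (72%).
z = sign_test_z(36, 50)
p = one_sided_p_value(z)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

With only 50 to 150 test observations the normal approximation is coarse, so an exact binomial tail probability would be a reasonable alternative.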
Table 1. Out-of-Sample MSE and Sign Predictions from Selected Networks: British Pound.

Network      Test  Selected           Recursive Result           NLS Result
Type         Obs.  Network   PSC      MSE               Sign     MSE               Sign

Feedforward    50  (1,2)     .4359    .3566 ( .6101)    64.0**   .3657 ( .6006)    62.0**
                   (1,4)     .4363    .3652 ( .6180)    72.0**   .3627 ( .5957)    58.0
                   (2,2)     .4365    .3656 ( .6324)    72.0**   .3822 ( .5506)    72.0**

              100  (5,3)     .4210    .5575 ( .6114)    62.0**   .5915 (-.5283)    54.0
                   (4,3)     .4211    .5956 (-.5645)    59.0**   .6130 (-.6062)    40.0
                   (6,2)     .4211    .5637 ( .5654)    62.0**   .5588 ( .5392)    61.0**

              150  (5,3)     .4242    .4930 ( .5701)    62.0**   .5028 (-.4124)    54.7
                   (4,3)     .4244    .5146 (-.5506)    56.7**   .5354 (-.6166)    40.7
                   (1,2)     .4246    .4859 ( .6279)    59.3**   .4820 ( .6418)    59.3**

Recurrent      50  (6,2)     .4352    .3672 ( .6069)    66.0**   .3714 ( .5666)    58.0
                   (1,2)     .4360    .3661 ( .6325)    72.0**   .3708 ( .6153)    72.0**
                   (2,4)     .4365    .3597 ( .6290)    72.0**   .3697 ( .6083)    68.0**

              100  (1,2)     .4201    .5631 ( .5699)    60.0**   .5568 ( .6146)    61.0**
                   (6,2)     .4209    .5632 ( .5656)    61.0**   .5712 ( .4300)    59.0**
                   (4,4)     .4210    .5839 (-.5410)    55.0     .6007 (-.5609)    54.0

              150  (1,2)     .4231    .4930 ( .5868)    58.7**   .4872 ( .6268)    59.3**
                   (6,2)     .4236    .4997 ( .4690)    59.3**   .5149 (-.5731)    46.7
                   (6,3)     .4236    .5041 (-.4623)    58.7**   .5220 (-.5688)    58.7**

Notes: The selected networks are ordered from the best to the third best according to their PSC
values. "MSE" stands for out-of-sample MSE; "Sign" stands for the proportion (in percent) of
correct sign predictions in the out-of-sample period. The numbers in parentheses in the MSE
columns are Mizrach's MSE-comparison statistics. For sign prediction results, * and ** denote
significance at the 10% and 5% levels, respectively. The other tables follow the same convention.
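
As a rough illustration of the two summary measures defined in the notes, the following sketch computes an out-of-sample MSE and a percentage of correct sign predictions from a forecast series. It assumes that the series being forecast consists of exchange rate changes, so that the random walk benchmark corresponds to a forecast of zero change and has no sign prediction. The function names, the rule that a zero forecast or zero realization counts as incorrect, and the toy numbers are all hypothetical.

```python
from typing import Sequence


def out_of_sample_mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean squared forecast error over the test observations."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sum(squared_errors) / len(squared_errors)


def correct_sign_percent(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Percentage of test observations whose forecast has the realized sign.

    Assumption: a zero forecast or zero realization is counted as incorrect.
    """
    hits = sum(1 for a, f in zip(actual, forecast) if a * f > 0)
    return 100.0 * hits / len(actual)


# Toy data (hypothetical): realized changes and one-step-ahead forecasts.
actual = [0.4, -0.2, 0.1, -0.5]
forecast = [0.1, -0.3, -0.2, -0.1]
random_walk = [0.0] * len(actual)  # random walk: predicted change is zero

print("model MSE      :", out_of_sample_mse(actual, forecast))
print("random walk MSE:", out_of_sample_mse(actual, random_walk))
print("model sign (%) :", correct_sign_percent(actual, forecast))
```

In this toy example the model has the smaller MSE and predicts three of the four signs correctly. Whether such a difference is significant is a separate question, which the tables address with Mizrach's statistic for MSE and a significance test for signs.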
Table 2. Out-of-Sample MSE and Sign Predictions from Selected Networks: Canadian Dollars.

Network      Test  Selected           Recursive Result           NLS Result
Type         Obs.  Network   PSC      MSE               Sign     MSE               Sign

Feedforward    50  (1,4)     .6134    .1882 ( .4814)    54.0     .1887 ( .4826)    56.0
                   (1,5)     .6152    .1888 ( .4674)    54.0     .1937 (-.5071)    56.0
                   (1,3)     .6173    .1885 ( .5030)    56.0     .1885 ( .4749)    54.0

              100  (1,4)     .6218    .3162 (-.5203)    49.0     .3134 (-.4740)    52.0
                   (5,2)     .6247    .3301 (-.6273)    44.0     .3514 (-.6163)    52.0
                   (2,2)     .6253    .3137 (-.4619)    49.0     .3085 ( .4830)    53.0

              150  (1,4)     .6221    .4167 (-.4417)    49.3     .4155 ( .4039)    52.0
                   (2,2)     .6251    .4190 (-.4917)    48.0     .4145 ( .4525)    50.0
                   (1,2)     .6254    .4199 (-.5248)    47.3     .4162 (-.3945)    51.3

Recurrent      50  (1,3)     .6128    .1880 ( .5009)    56.0     .1914 (-.4193)    52.0
                   (2,2)     .6148    .1883 ( .5422)    56.0     .1879 ( .5061)    56.0
                   (1,2)     .6161    .1860 ( .5546)    56.0     .1876 ( .5206)    56.0

              100  (2,2)     .6243    .3121 (-.4156)    51.0     .3114 ( .4339)    51.0
                   (5,2)     .6276    .3103 ( .4980)    52.0     .3083 ( .5606)    51.0
                   (1,2)     .6277    .3117 ( .3881)    51.0     .3111 ( .4568)    52.0

              150  (2,2)     .6242    .4175 (-.4941)    49.3     .4162 (-.4020)    49.3
                   (1,2)     .6276    .4198 (-.5073)    48.7     .4158 ( .3275)    50.7
                   (5,2)     .6276    .4158 ( .3523)    50.0     .4141 ( .4912)    49.3

Note: PSC and MSE are the numbers in the table × 10^{-1}.
Table 3. Out-of-Sample MSE and Sign Predictions from Selected Networks: Deutsche Mark.

Network      Test  Selected           Recursive Result           NLS Result
Type         Obs.  Network   PSC      MSE               Sign     MSE               Sign

Feedforward    50  (2,2)     .4983    .1989 ( .5805)    62.0**   .1895 ( .5592)    52.0
                   (5,2)     .4988    .1994 ( .6063)    60.0*    .1999 ( .5554)    64.0**
                   (2,5)     .4995    .1963 ( .6172)    64.0**   .1942 ( .5769)    64.0**

              100  (2,5)     .4741    .5969 ( .5603)    61.0**   .7917 (-.5657)    58.0*
                   (2,2)     .4758    .5976 ( .5606)    60.0**   .6060 (-.3605)    52.0
                   (2,4)     .4760    .5962 ( .5705)    60.0**   .6008 ( .4831)    57.0*

              150  (5,2)     .4809    .5236 ( .5064)    58.0**   .5333 (-.4856)    50.7
                   (2,2)     .4810    .5202 ( .5728)    58.0**   .5339 (-.4954)    53.3
                   (4,2)     .4815    .5233 ( .5475)    61.3**   .5419 (-.5922)    58.0**

Recurrent      50  (1,2)     .4976    .2006 ( .6028)    62.0**   .2014 ( .5989)    62.0**
                   (4,2)     .4998    .1989 ( .5827)    62.0**   .2030 ( .5024)    60.0*
                   (5,2)     .4999    .1993 ( .5895)    60.0*    .1943 ( .5680)    62.0**

              100  (1,2)     .4734    .6014 ( .5115)    61.0**   .5915 ( .5629)    55.0
                   (4,2)     .4760    .6043 ( .4391)    60.0**   .6120 (-.4826)    56.0
                   (1,4)     .4767    .6066 (-.4335)    59.0**   2.088 (-.5628)    50.0

              150  (5,2)     .4791    .5297 (-.4756)    58.7**   .5441 (-.5712)    57.3**
                   (1,2)     .4793    .5210 ( .5869)    60.0**   .5202 ( .5957)    60.0**
                   (4,2)     .4817    .5234 ( .5289)    60.0**   .5299 (-.4645)    50.7
Table 4. Out-of-Sample MSE and Sign Predictions from Selected Networks: Japanese Yen.

Network      Test  Selected           Recursive Result           NLS Result
Type         Obs.  Network   PSC      MSE               Sign     MSE               Sign

Feedforward    50  (1,6)     .4490    .1178 ( .5929)    64.0**   .1125 ( .5886)    60.0*
                   (2,6)     .4532    .1181 ( .6201)    64.0**   .1151 ( .5495)    64.0**
                   (4,2)     .4616    .1153 ( .6045)    70.0**   .1156 ( .5878)    66.0**

              100  (2,3)     .4732    .1721 ( .4877)    50.0     .1791 (-.5285)    49.0
                   (1,3)     .4752    .1701 ( .5857)    59.0**   .1709 ( .5739)    61.0**
                   (4,2)     .4754    .1747 (-.4577)    54.0     .1765 (-.4741)    50.0

              150  (1,5)     .4785    .2293 ( .4430)    58.0**   .2265 ( .5662)    56.7**
                   (1,3)     .4815    .2282 ( .5111)    59.3**   .2320 (-.4858)    59.3**
                   (1,4)     .4820    .2285 ( .4945)    58.0**   .2281 ( .5704)    57.3**

Recurrent      50  (1,4)     .4571    .1227 (-.4108)    40.0     .1150 ( .6031)    66.0**
                   (5,2)     .4624    .1124 ( .6142)    54.0     .1076 ( .5835)    56.0
                   (5,4)     .4630    .1203 ( .5569)    50.0     .1265 (-.5032)    52.0

              100  (1,3)     .4716    .1716 ( .5313)    57.0*    .1729 ( .5421)    55.0
                   (4,2)     .4740    .1762 (-.5320)    51.0     .1872 (-.5783)    53.0
                   (5,2)     .4749    .1804 (-.6110)    47.0     .1768 (-.4779)    56.0

              150  (1,3)     .4807    .2262 ( .6184)    58.7**   .2312 (-.4673)    58.0**
                   (5,4)     .4809    .2291 ( .4608)    58.7**   .2495 (-.6037)    50.7
                   (6,2)     .4810    .2345 (-.5697)    50.0     .2480 (-.5938)    50.7
Table 5. Out-of-Sample MSE and Sign Predictions from Selected Networks: Swiss Franc.

Network      Test  Selected           Recursive Result           NLS Result
Type         Obs.  Network   PSC      MSE               Sign     MSE               Sign

Feedforward    50  (2,5)     .5743    .2039 ( .6203)    66.0**   .2030 ( .6117)    62.0**
                   (3,3)     .5743    .2037 ( .6104)    60.0*    .2009 ( .5936)    66.0**
                   (2,2)     .5748    .2069 ( .6273)    62.0**   .1965 ( .6182)    66.0**

              100  (2,4)     .5712    .4222 (-.4563)    55.0     .4482 (-.5653)    53.0
                   (3,3)     .5718    .4151 ( .5177)    58.0*    .4187 ( .4243)    59.0**
                   (2,2)     .5723    .4161 ( .5626)    57.0*    .4185 ( .4253)    54.0

              150  (2,5)     .5772    .4228 (-.4761)    58.0**   .4466 (-.5871)    56.7**
                   (2,2)     .5785    .4163 ( .5788)    58.7**   .4212 (-.4213)    58.7**
                   (2,3)     .5789    .4132 ( .5540)    57.3**   .4415 (-.5831)    50.0

Recurrent      50  (1,2)     .5720    .2062 ( .6271)    62.0**   .2105 ( .5569)    64.0**
                   (1,4)     .5738    .2063 ( .6009)    62.0**   .2061 ( .5770)    64.0**
                   (4,2)     .5750    .2098 ( .5869)    64.0**   .2104 ( .5235)    68.0**

              100  (1,2)     .5696    .4157 ( .5560)    57.0*    .4205 (-.4251)    58.0**
                   (1,4)     .5716    .4184 ( .4586)    57.0*    .4468 (-.5754)    55.0
                   (1,3)     .5725    .4155 ( .5736)    57.0*    .4373 (-.5354)    57.0*

              150  (1,2)     .5770    .4148 ( .5828)    58.7**   .4146 ( .5302)    55.3*
                   (1,4)     .5802    .4173 ( .5166)    58.0**   .4170 ( .5113)    58.0**
                   (3,2)     .5806    .4200 ( .3302)    56.0*    .4218 (-.4526)    58.0**
Table 6. Out-of-Sample MSE and Sign Predictions from ARMA Models.

Test  ARMA   BP               CD               DM               JY               SF
Obs.  Model  MSE       Sign   MSE       Sign   MSE       Sign   MSE       Sign   MSE       Sign

  50  (0,0)  .3884     N/A    .1907     N/A    .2099     N/A    .1225     N/A    .2157     N/A
      (1,0)  .3893     60.0*  .1890*    46.0   .2098     48.0   .1199     52.0   .2163     56.0
             (-.285)          (1.374)          (.023)           (.950)           (-.347)
      (0,1)  .3897     62.0** .1889*    46.0   .2099     48.0   .1202     50.0   .2163     54.0
             (-.380)          (1.428)          (-.004)          (.916)           (-.377)
      (1,1)  .3915     56.0   .1885*    46.0   .2096     46.0   .1198     52.0   .2159     58.0
             (-.796)          (1.524)          (.144)           (.964)           (-.112)
      (2,2)  .3910     58.0   .1896     44.0   .2034     54.0   .1202     50.0   .2124     52.0
             (-.686)          (1.071)          (1.100)          (.916)           (.859)

 100  (0,0)  .5730     N/A    .3119     N/A    .6055     N/A    .1740     N/A    .4198     N/A
      (1,0)  .5741     59.0** .3144     43.0   .6027     48.0   .1708*    56.0   .4185     54.0
             (-.289)          (-1.015)         (.796)           (1.632)          (.573)
      (0,1)  .5743     59.0** .3144     44.0   .6030     48.0   .1708*    54.0   .4186     52.0
             (-.334)          (-.970)          (.788)           (1.624)          (.572)
      (1,1)  .5745     56.0   .3145     44.0   .6028     47.0   .1709*    56.0   .4184     54.0
             (-.430)          (-.865)          (.840)           (1.625)          (.630)
      (2,2)  .5814†    48.0   .3138     46.0   .6001     55.0   .1705**   53.0   .4190     53.0
             (-1.627)         (-1.027)         (.840)           (1.693)          (.163)

 150  (0,0)  .5016     N/A    .4158     N/A    .5273     N/A    .2299     N/A    .4200     N/A
      (1,0)  .5023     55.3*  .4200†    44.7   .5253     50.7   .2252**   54.7   .4204     49.3
             (-.234)          (-1.581)         (.934)           (1.826)          (-.174)
      (0,1)  .5024     55.3*  .4199†    45.3   .5255     50.7   .2252**   52.7   .4203     48.0
             (-.276)          (-1.493)         (.934)           (1.855)          (-.137)
      (1,1)  .5026     52.0   .4198†    45.3   .5253     50.7   .2254**   55.3*  .4209     50.7
             (-.363)          (-1.350)         (.935)           (1.764)          (-.444)
      (2,2)  .5069†    47.3   .4198††   44.7   .5259     51.3   .2251**   54.7   .4269     49.3
             (-1.357)         (-1.756)         (.241)           (1.856)          (-1.083)

Note: For MSE, † and †† (* and **) indicate the models that are significantly worse (better) than
the random walk model at the 10% and 5% levels, respectively.
Figure 1: A Simple Feedforward Network with One Output Unit, Two Hidden Units, and Three
Input Units.

Figure 2: A Simple Elman (1990) Network with Hidden-Unit Activations Feedback.